Tue.O5d.04 Considering Global Variance of the Log Power Spectrum Derived from Mel-Cepstrum in HMM-based Parametric Speech Synthesis
Abstract
This paper uses the global variance (GV) of the log power spectrum (LPS) derived from mel-cepstrum to improve hidden Markov model (HMM)-based parametric speech synthesis. To alleviate over-smoothing of the generated spectral structures, our previous work proposed an LPS-GV modeling method using line spectral pairs (LSPs), in which the estimated LPS-GV distribution was combined with the trained acoustic model to determine the optimal spectral features at synthesis time. In this paper, we extend that method to the case where mel-cepstral coefficients are used as the spectral features. Further, we propose integrating LPS-GV distortions into the minimum generation error (MGE) training criterion in order to avoid the high computational complexity of the parameter generation algorithm with a GV model. Experimental results show that, when mel-cepstral features are adopted, the parameter generation algorithm using the LPS-GV model produces more natural acoustic features than the conventional GV modeling method. In addition, integrating LPS-GV distortions into the model training criterion achieves performance similar to that of applying the LPS-GV model at synthesis time.
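As a rough illustration of the quantity discussed in the abstract, the following Python sketch derives a log power spectrum from mel-cepstral frames and then takes its variance over time. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the all-pass warping factor `alpha=0.42`, the 257 frequency bins, the natural-log scaling with a factor of 2, and the reading of LPS-GV as a per-bin variance across the frames of one utterance are all assumptions made here for illustration.

```python
import numpy as np


def warped_frequencies(n_bins, alpha):
    """Map linear frequencies in [0, pi] onto the warped (mel-like) axis.

    Uses the phase response of a first-order all-pass filter; alpha is an
    assumed warping factor, not a value taken from the paper.
    """
    omega = np.linspace(0.0, np.pi, n_bins)
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))


def log_power_spectrum(mcep, n_bins=257, alpha=0.42):
    """Compute an LPS of shape (frames, n_bins) from mel-cepstra.

    mcep has shape (frames, order + 1). The LPS is taken as
    2 * sum_m c_m * cos(m * warped_omega), which is an assumed convention.
    """
    mcep = np.asarray(mcep, dtype=float)
    orders = np.arange(mcep.shape[1])
    # Cosine basis of shape (order + 1, n_bins) on the warped axis.
    basis = np.cos(np.outer(orders, warped_frequencies(n_bins, alpha)))
    return 2.0 * (mcep @ basis)


def global_variance(features):
    """Per-dimension variance over the time axis of one utterance."""
    return np.var(np.asarray(features, dtype=float), axis=0)


def lps_gv(mcep, n_bins=257, alpha=0.42):
    """Assumed LPS-GV: variance over frames of each LPS frequency bin."""
    return global_variance(log_power_spectrum(mcep, n_bins, alpha))
```

Because the LPS in this sketch is linear in the cepstral coefficients, shrinking a trajectory toward its mean by a factor s shrinks the LPS-GV by s squared, which is one way to see how over-smoothed trajectories produce a reduced GV.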
Related resources
Prosody control in HMM-based speech synthesis
In HMM-based speech synthesis, trained statistical models (context-dependent HMMs) are used to predict duration and generate parameters such as mel-cepstral coefficients, log F0 values, and bandpass voicing strengths using the maximum likelihood parameter generation algorithm including global variance (Toda et al., 2007). In the later stages, F0 parameters, bandpass voicing strengths, and the five ...
Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis
This paper presents a method to model the global variance (GV) of log power spectra derived from the line spectral pairs (LSPs) in a sentence for HMM-based parametric speech synthesis. Unlike the conventional GV method, where the observations for GV model training are the variances of spectral parameters for each training sentence, our proposed method directly models the temporal varia...
Analysis on the Importance of Short-Term Speech Parameterizations for Emotional Statistical Parametric Speech Synthesis
This paper presents a study on the importance of short-term spectral and excitation parameterizations for emotional hidden Markov model (HMM)-based speech synthesis. The analysis is performed through an emotion classification task using two methods: K-means emotion clustering and Gaussian mixture model (GMM)-based emotion identification. Two known forms of parameterization for the short-term...
Advances in Spectral Parameterization for Statistical (HMM-Based) TTS
HMM-based parametric speech synthesis has recently become an alternative to the concatenative TTS approach, especially when a low footprint and a general speech domain are required. A majority of the speech parameterization models used in state-of-the-art HMM TTS systems employ source-filter waveform synthesis schemes. Sinusoidal representation and waveform generation of speech is an alternative to the ...